Abstract.

Radial velocity surveys are examined by means of an eigenmode analysis within the framework of the CDM-like family of models. Rich surveys such as MARK III and SFI, which consist of more than $10^3$ radial velocities, are found to have a few tens of modes that are not noise dominated. Poor surveys, which have only a few tens of radial velocities, are noise dominated across the eigenmode spectrum. In particular, the bulk velocity of such surveys is found to be dominated by the noisier modes. The MARK III and SFI surveys are well fitted by a tilted flat CDM model found by a maximum likelihood analysis and a $\chi^2$ statistic. However, a mode-by-mode inspection shows that a substantial fraction of the modes lie outside the 90% confidence level. This implies that although the CDM-like family of models seems globally consistent with radial velocity surveys, in detail it is not. This might indicate a need for a revised power spectrum or for some non-trivial biasing scheme.



1. Introduction 

The statistical analysis of radial velocity surveys plays a major role in the study of the large scale structure. Broadly speaking, the analysis focuses on the estimation of the cosmological parameters and the reconstruction of the local cosmography. Radial velocity surveys suffer from incomplete and anisotropic sky coverage, inhomogeneous and sparse sampling, and distance measurement errors that are often larger than the individual velocities. Such surveys do not lend themselves easily to statistical analysis and pose a real challenge. The history of the field is therefore rich with controversies about the interpretation and cosmological implications of different surveys and with seemingly conflicting results.

A prime goal of a statistical analysis is to provide a tool for confronting theoretical models with the data. This calls for a formalism that presents models and data in the same language, a problem closely related to that of the functional representation. From the theoretical point of view, the choice of representation is dictated by the symmetries of the theory. The cosmological principle makes Fourier plane waves, or spherical Bessel functions and spherical harmonics, the natural choice. However, typical astronomical observations break these symmetries, as the data are neither homogeneous nor isotropic. A better representation should reflect both the underlying theory and the particularities of the data. Here we follow the standard eigenmode analysis, also known as principal component analysis (PCA) and the Karhunen-Loeve transform. This method, or its slightly modified version of signal-to-noise eigenmodes, was suggested before as a method of analyzing redshift surveys (Vogeley and Szalay 1996 and references therein). Previous applications focused mostly on parameter estimation. Here we extend the method and use it as a general tool for understanding the nature of the data, its noise structure and its information content. For velocity surveys, the determination of the bulk velocity of a given survey and its relation to the underlying velocity field are revisited. The PCA is also used here to address the problem of the power spectrum determination.



2. Eigenmode Analysis: Radial Velocities and Bulk Velocities 

Consider a database of radial velocities $\{u_i\}_{i=1,\ldots,N}$, where

$$u_i = \mathbf{v}(\mathbf{r}_i) \cdot \hat{\mathbf{r}}_i + \epsilon_i, \qquad (1)$$

$\mathbf{v}$ is the three dimensional velocity, $\mathbf{r}_i$ is the position of the i-th data point and $\epsilon_i$ is the statistical error associated with the i-th radial velocity. It is assumed here that the cosmological model describes the data well, that systematic errors have been properly dealt with and that the statistical errors are well understood. The data auto-covariance matrix is then written as:

$$R_{ij} = \langle u_i u_j \rangle = \hat{\mathbf{r}}_i \cdot \langle \mathbf{v}(\mathbf{r}_i)\, \mathbf{v}(\mathbf{r}_j) \rangle \cdot \hat{\mathbf{r}}_j + \sigma^2_{ij}. \qquad (2)$$

(Here $\langle \ldots \rangle$ denotes an ensemble average.) The last term, $\sigma^2_{ij}$, is the error covariance matrix. The velocity covariance tensor that enters this equation was derived by Gorski (1988, see also Zaroubi, Hoffman and Dekel 1999) and it depends on the power spectrum and the cosmological parameters.
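
As an illustration of how the matrix of Eq. 2 can be assembled, the following is a minimal sketch and not the code used for the results reported here. It assumes the linear-theory, potential-flow form of the velocity covariance tensor, $\Psi_\perp(r)\delta_{\alpha\beta} + [\Psi_\parallel(r) - \Psi_\perp(r)]\hat{n}_\alpha\hat{n}_\beta$, with kernels $j_1(x)/x$ and $j_0(x) - 2j_1(x)/x$ integrated over the power spectrum, and it assumes uncorrelated errors. The positions, errors, the toy spectrum `pk` and the prefactor `amp` (which stands in for the combination of cosmological parameters) are all hypothetical placeholders.

```python
import numpy as np

def j0(x):
    """Spherical Bessel function j0(x) = sin(x)/x (np.sinc handles x = 0)."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def j1_over_x(x):
    """j1(x)/x, switching to the series 1/3 - x^2/30 near x = 0."""
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-2
    xs = np.where(small, 1.0, x)  # placeholder avoids 0/0 in the unused branch
    full = (np.sin(xs) / xs**2 - np.cos(xs) / xs) / xs
    return np.where(small, 1.0 / 3.0 - x**2 / 30.0, full)

def trapezoid_weights(k):
    """Non-negative trapezoidal quadrature weights for the k grid."""
    w = np.zeros_like(k)
    dk = np.diff(k)
    w[1:] += 0.5 * dk
    w[:-1] += 0.5 * dk
    return w

def radial_velocity_covariance(pos, sigma, k, pk, amp=1.0):
    """Return (R, S): the full matrix of Eq. 2 and its noise-free signal part."""
    rhat = pos / np.linalg.norm(pos, axis=1)[:, None]
    sep = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(sep, axis=2)
    nhat = sep / np.where(r > 0, r, 1.0)[..., None]  # zero vector on the diagonal
    x = r[..., None] * k                              # shape (N, N, n_k)
    w = trapezoid_weights(k) * pk
    kern_perp = j1_over_x(x)
    psi_perp = amp * (kern_perp * w).sum(axis=-1)
    psi_par = amp * ((j0(x) - 2.0 * kern_perp) * w).sum(axis=-1)
    cos_ij = rhat @ rhat.T
    a = np.einsum("ia,ija->ij", rhat, nhat)           # rhat_i . nhat_ij
    b = np.einsum("ja,ija->ij", rhat, nhat)           # rhat_j . nhat_ij
    signal = psi_perp * cos_ij + (psi_par - psi_perp) * a * b
    return signal + np.diag(sigma**2), signal

rng = np.random.default_rng(1)
pos = rng.uniform(-50.0, 50.0, size=(30, 3))   # hypothetical positions
sigma = rng.uniform(100.0, 300.0, size=30)     # hypothetical velocity errors
k = np.linspace(1e-3, 0.5, 300)
pk = k * np.exp(-k / 0.05)                     # toy spectrum, NOT a CDM P(k)
R, S = radial_velocity_covariance(pos, sigma, k, pk, amp=1.0e7)
```

Because the quadrature weights and the toy spectrum are non-negative, the signal part is a sum of positive semi-definite pieces, which is what the eigenmode analysis below relies on.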

The eigenmodes of the data covariance matrix provide a natural representation of the data:

$$R\, \eta^{(i)} = \lambda_i\, \eta^{(i)}. \qquad (3)$$

The set of N eigenmodes $\{\eta^{(i)}\}$ constitutes an orthonormal basis and the eigenvalues $\lambda_i$ are arranged in decreasing order. A new representation of the data is given by:

$$\tilde{a}_i = \sum_j \eta^{(i)}_j u_j. \qquad (4)$$

This provides a statistically orthogonal representation, namely:

$$\langle \tilde{a}_i \tilde{a}_j \rangle = \lambda_i \delta_{ij}. \qquad (5)$$

The normalized transformed variables are defined here by:

$$a_i = \frac{\tilde{a}_i}{\sqrt{\lambda_i}}. \qquad (6)$$

Eq. 5 is written now as:

$$\langle a_i a_j \rangle = \delta_{ij}. \qquad (7)$$

Note that as the modes are statistically independent, one can measure the $\chi^2$ of a given mode independently of all other modes:

$$\chi^2_i = a_i^2. \qquad (8)$$

For normally distributed errors and a Gaussian random velocity field the $a_i$'s are normally distributed with zero mean and unit variance.
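
The transformation of Eqs. 3 to 8 can be sketched numerically as follows. This is an illustrative example under stated assumptions only: the signal covariance is a toy Gaussian kernel standing in for the velocity covariance tensor, the errors are hypothetical and uncorrelated, and the mock data are drawn from the assumed covariance matrix itself so that the unit variance of the normalized variables can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Toy covariance matrix: a Gaussian kernel as a stand-in for the signal term,
# plus a diagonal error term (both hypothetical).
pos = rng.normal(size=(n, 3))
d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
signal = 300.0**2 * np.exp(-0.5 * d2)
sigma = rng.uniform(100.0, 400.0, size=n)
R = signal + np.diag(sigma**2)

# Eq. 3: eigenmodes of R, rearranged so the eigenvalues decrease.
lam, eta = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
lam, eta = lam[::-1], eta[:, ::-1]    # column eta[:, i] is the i-th mode

# Mock data sets drawn from R (one realization per row).
chol = np.linalg.cholesky(R)
u = rng.standard_normal((5000, n)) @ chol.T

# Eq. 4: projection onto the modes; Eq. 6: normalization; Eq. 8: chi^2 per mode.
a_tilde = u @ eta
a = a_tilde / np.sqrt(lam)
chi2_per_mode = a**2

print("sample variance of a_i, first 5 modes:", a.var(axis=0)[:5])
```

For data that are consistent with the assumed covariance matrix, the sample variance of each $a_i$ scatters around unity, which is the property the mode-by-mode test exploits.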

Velocity surveys are often analyzed in terms of their bulk flow, namely by fitting the velocity field with a single constant velocity vector, ignoring any possible correlations of the underlying field. There are a variety of ways of defining the bulk velocity of a given survey; here we adopt the Kaiser (1988) algorithm, which evaluates an error weighted bulk velocity. The full complexity of the underlying field, with its N degrees of freedom, is thus compressed into only three parameters. It is often argued that this data compression enables the extraction of statistically more robust quantities from the data, thus providing better constraints on the models. The bulk velocity properties of a survey are studied here within the PCA formalism.

The bulk velocity ($\mathbf{B}$) of a survey is defined by means of a linear operator ($L$), $\mathbf{B} = L\mathbf{u}$ (see Kaiser 1988 for the formal derivation). $\mathbf{B}$ is expanded here as

$$\mathbf{B} = \sum_i a_i \mathbf{B}^{(i)} = \sum_i a_i \sqrt{\lambda_i}\, L\, \eta^{(i)}, \qquad (9)$$

where $\mathbf{B}^{(i)}$ is the bulk velocity associated with the i-th mode. The bulk velocity covariance matrix is:

$$\langle B_\alpha B_\beta \rangle = \sum_i B^{(i)}_\alpha B^{(i)}_\beta. \qquad (10)$$

In the case of anisotropic sampling the bulk velocity covariance matrix is anisotropic as well. In the limit of a perfect survey (isotropic, no errors, dense, homogeneous) the eigenvectors are the spherical harmonics and Bessel functions. In such a case, and assuming the data to lie on a thin shell, one expects:

$$\mathbf{B}^{(i)} = 0 \quad \mathrm{for}\ i \neq 2, 3, 4. \qquad (11)$$
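
The decomposition of the bulk velocity into modes (Eqs. 9 and 10) can be illustrated with the short sketch below. The operator `L` is assumed here to be the inverse-variance weighted least-squares fit of a constant vector to the radial velocities, which is our reading of "error weighted"; the exact weights of Kaiser (1988) may differ. The covariance matrix is again a toy stand-in with hypothetical positions and errors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60

# Hypothetical survey geometry and errors, with a toy covariance matrix.
pos = rng.normal(size=(n, 3)) * 40.0
rhat = pos / np.linalg.norm(pos, axis=1)[:, None]
sigma = rng.uniform(100.0, 400.0, size=n)
d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
R = 300.0**2 * np.exp(-0.5 * d2 / 20.0**2) + np.diag(sigma**2)

lam, eta = np.linalg.eigh(R)
lam, eta = lam[::-1], eta[:, ::-1]

# Assumed form of the linear operator L (3 x N): weighted least squares,
# B = (A^T W A)^{-1} A^T W u with A = rhat and W = diag(1 / sigma^2).
w = 1.0 / sigma**2
L = np.linalg.solve(rhat.T @ (w[:, None] * rhat), (rhat * w[:, None]).T)

# Eq. 9: bulk velocity of each mode, B^(i) = sqrt(lambda_i) L eta^(i).
B_modes = (L @ eta) * np.sqrt(lam)        # column i holds B^(i)

# Eq. 10: bulk velocity covariance as a sum over modes.
B_cov = B_modes @ B_modes.T

# Amplitude of each mode's bulk velocity, the quantity shown versus mode number.
B_amp = np.linalg.norm(B_modes, axis=0)
```

The sum over modes in Eq. 10 reproduces $L R L^{T}$, so the contribution of the noisy modes to the bulk velocity covariance can be read off term by term.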

3. Observations 

The problem addressed here is the quality and expected significance of a survey given an assumed model, i.e. power spectrum, of the underlying velocity field. It follows that one is more interested here in the sampling, sky coverage and errors than in the actual numerical values of the data points. Four data sets are studied here: the MARK III catalog (Willick et al. 1995), SFI (da Costa et al. 1996), LP10K (Willick 1999) and the nearby Type Ia supernovae (hereafter SN; Riess 1999). The first two data sets consist of more than 1000 radial velocities and are considered rich catalogs, whereas the other two contain the velocities of only 15 Abell clusters (LP10K) and 44 Type Ia supernovae (SN) and are considered here poor catalogs. The analysis has been applied to a wide range of CDM-like models, but only two models are explicitly presented here. One is a flat tilted CDM model ($n = 0.8$, $h = 0.75$, $\Omega_0 = 1$, where $n$ is the power index, $h$ is Hubble's constant in units of 100 km s$^{-1}$ Mpc$^{-1}$ and $\Omega_0$ is the density parameter), and the other is a flat $\Lambda$-CDM model with $n = 1$, $h = 1$, $\Omega_0 = 0.4$. Both models are COBE normalized. The tilted model is the most probable CDM-like model for the MARK III data (Zaroubi et al. 1997), and is very close to the model favored by SFI (Freudling et al. 1999). The other model is currently the most popular model obeying the age and geometrical constraints. The results obtained for the two models are very similar. The strategy followed here is to compute the eigenmodes and eigenvalues of a given survey and model with and without the noise. Comparing the two reveals how many independent modes are signal dominated and how many are noise dominated. It also helps to assess the degree to which the bulk velocity of the sample reflects the underlying velocity field rather than the observational errors.

Figure 1. The eigenvalue spectrum is plotted against the mode number for the MARK III (left) and SFI (right) data. The solid line is the spectrum of the noise-free covariance matrix and the dashed line corresponds to the full (signal+noise) covariance matrix. The spectra are calculated for the tilted-CDM model.

For all data sets the noise-free eigenvalue spectrum follows an approximate power law over most of the range of modes. The addition of noise breaks this power law decline and flattens the spectrum. The transition from one regime to the other marks the transition from the signal dominated to the noise dominated regime. There is a striking difference between the rich surveys (MARK III and SFI, Fig. 1) and the poor surveys (SN and LP10K, Fig. 2). The former exhibit a clear break, with some 10 modes that are virtually unaffected by the noise and a few tens of modes that are not noise dominated. In the poor samples, on the other hand, all modes are noise dominated! This holds for a wide range of acceptable CDM-like models, with both COBE and cluster normalization.
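
The comparison of the noise-free and full spectra can be sketched as follows. The criterion used to call a mode signal dominated, namely that the signal term supplies more than half of its eigenvalue, is an assumption made for this illustration and is not necessarily the criterion behind the figures. The signal matrix is a toy Gaussian kernel and the error level is hypothetical.

```python
import numpy as np

def signal_fraction(signal, noise_var):
    """Eigen-decompose signal + noise and return the eigenvalues (decreasing)
    together with the fraction of each eigenvalue contributed by the signal."""
    R = signal + np.diag(noise_var)
    lam, eta = np.linalg.eigh(R)
    lam, eta = lam[::-1], eta[:, ::-1]
    # eta_i^T S eta_i for every mode i (eta[:, i] is the i-th eigenvector)
    sig_part = np.einsum("ji,jk,ki->i", eta, signal, eta)
    return lam, sig_part / lam

def toy_signal(n, seed, amp=300.0, length=1.0):
    """Hypothetical stand-in for the signal covariance (Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(n, 3))
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    return amp**2 * np.exp(-0.5 * d2 / length**2)

S = toy_signal(200, seed=3)
lam_free = np.linalg.eigvalsh(S)[::-1]                 # noise-free spectrum
lam_full, frac = signal_fraction(S, np.full(200, 150.0**2))
n_signal = int((frac > 0.5).sum())                     # assumed criterion
print("modes with signal fraction > 0.5:", n_signal)
```

Plotting `lam_free` and `lam_full` against the mode number gives the analogue of the solid and dashed curves described above, with the flattening of `lam_full` marking the onset of the noise dominated regime.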

Next, the bulk velocity associated with each eigenmode, $B^{(i)}$, is calculated. For an ideal survey only the first few modes are expected to be significant and the rest of the modes should have very small bulk velocities. MARK III and SFI indeed show such a behavior, namely the $B^{(i)}$ of the first few modes lie significantly above the noise level (Fig. 3). For the poor samples, SN and LP10K, the opposite trend is found: $B^{(i)}$ does not decline, or even grows, with the mode number, namely the more noise dominated a mode is, the higher its $B^{(i)}$ (Fig. 4). It follows that the sample bulk velocity of the rich surveys indeed reflects the underlying velocity field (convolved with the sample window function). In the case of the poor samples the bulk velocity is dominated by the noise. It should be stated that the strong conclusions expressed here are valid only within the framework of CDM-like cosmogonies.

Figure 2. The eigenvalue spectrum is plotted against the mode number for the SN (left) and LP10K (right) data. The spectra are calculated for the $\Lambda$-CDM model. Same notation as in Fig. 1.



4. Power Spectrum 

The optimal way of estimating the values of the cosmological parameters from a given survey is to perform a maximum likelihood analysis of the data over a range of models (Zaroubi et al. 1995). The maximum likelihood analysis does not provide an absolute measure of the quality of the parameter estimation, but rather finds the most probable model given the data and the assumed parameter space. A measure of the goodness of fit is provided by the $\chi^2$/d.o.f. However, it is often the case that the best model of a given parameter space and a given data set fits poorly on some scales and better on others, say small versus large scales. This can result in a 'conspiracy', yielding an adequate $\chi^2$ even for the 'wrong' model. The PCA, which projects the data onto statistically independent normal variables, enables the analysis of the data on a mode-by-mode basis and a check of the goodness of fit across the spectrum of modes. Recall that the $\chi^2$ of a given mode is simply $a_i^2$, and the cumulative $\chi^2$ is

$$\chi^2_M = \frac{\sum_{i=1}^{M} a_i^2}{M}. \qquad (12)$$
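
The cumulative statistic of Eq. 12 and a reference confidence band can be sketched as below. This is an illustration under assumptions: the band is estimated by Monte Carlo from independent unit normal variables and is pointwise in M, which may differ from how the confidence levels of the figures were computed, and the input `a_mock` is synthetic rather than survey data.

```python
import numpy as np

def cumulative_chi2(a):
    """Eq. 12: cumulative chi^2 per degree of freedom over the first M modes."""
    a = np.asarray(a, dtype=float)
    return np.cumsum(a**2) / np.arange(1, a.size + 1)

def monte_carlo_band(n_modes, lower=5.0, upper=95.0, n_real=4000, seed=0):
    """Pointwise percentiles of the cumulative chi^2 for unit normal a_i."""
    rng = np.random.default_rng(seed)
    sims = rng.standard_normal((n_real, n_modes))
    curves = np.cumsum(sims**2, axis=1) / np.arange(1, n_modes + 1)
    lo, hi = np.percentile(curves, [lower, upper], axis=0)
    return lo, hi

# Synthetic normalized mode amplitudes that are consistent with the model.
rng = np.random.default_rng(42)
a_mock = rng.standard_normal(500)

chi2_M = cumulative_chi2(a_mock)
lo, hi = monte_carlo_band(500)
outside = (chi2_M < lo) | (chi2_M > hi)
print("fraction of mode numbers outside the band:", outside.mean())
```

Since each realization's curve is a running average, excursions outside the band are strongly correlated in M, so a long run outside it should be read as one systematic departure and not as many independent failures.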

This combined PCA and $\chi^2$ analysis is applied here to the SFI and MARK III catalogs. (The smaller data sets, SN and LP10K, are not discriminative enough to enable such a study.) Fig. 5 shows the cumulative $\chi^2_M$ of the MARK III and SFI surveys (assuming the tilted-CDM model). The lower and upper 90% confidence levels are plotted on top. For MARK III the total $\chi^2$/d.o.f. = 1.02 is well within the 90% limits, and the model therefore seems to be very consistent with the data. However, Fig. 5 shows that over most of the mode number range $\chi^2_M$ lies outside the 90% confidence band. In fact, from approximately the 100th mode to the last one there is a monotonic increase of $\chi^2_M$. A similar trend is exhibited by the SFI data. To check the constraining power of the PCA/$\chi^2$ test, it has been applied to the mock MARK III catalog of Kolatt et al. (1996). The cumulative $\chi^2_M$ is found to be fully consistent with the assumed model (figures are not shown here). Thus, a systematic inconsistency of the best CDM-like model with the data is found here, which possibly suggests a fundamental problem with the CDM paradigm. A more detailed analysis is to be presented elsewhere.

Figure 3. The bulk velocity spectrum is plotted against the mode number for the MARK III (left) and SFI (right) data. The crosses correspond to the spectrum of the signal covariance matrix and the squares to the full (signal+noise) covariance matrix. The spectra are calculated for the tilted-CDM model.

Figure 4. The bulk velocity spectrum is plotted against the mode number for the SN (left) and LP10K (right) data. The spectra are calculated for the $\Lambda$-CDM model. Same notation as in Fig. 3.

Figure 5. Cumulative $\chi^2$ analysis of the MARK III (left) and SFI (right) data for a flat tilted ($n = 0.8$) CDM model. The lower (5%) and upper (95%) confidence levels are plotted for reference.

5. Discussion 

The analysis presented here consists of two parts. First, the constraining power of radial velocity surveys has been examined within the CDM-like family of models. Using PCA, the structure of the expected data has been considered, rather than the actual numerical values of the data. The analysis reveals that rich surveys such as MARK III and SFI have a few tens of modes that are not noise dominated, and hence are expected to reflect the underlying velocity field. Poor samples such as LP10K, SN and most probably all other surveys that consist of a few tens of objects are noise dominated. Not even a single eigenmode is signal dominated for such surveys, and the bulk velocity is dominated by the noisier, and less significant, modes. This does not imply that such surveys are of no use in cosmology, but that they should be analyzed with great care. Direct reconstruction methods might be completely noise dominated and might be very misleading. Indirect methods such as Wiener filtering and maximum entropy should be useful in analyzing such data. A note of caution is due here: the statements made here are valid only within the framework of the standard cosmogony of the CDM-like family of models.

Having convinced ourselves that surveys such as MARK III and SFI are powerful enough to constrain the CDM-like models, we have examined the consistency of these surveys with the models in detail. A mode-by-mode inspection finds significant discrepancies with the spectral behavior predicted by the 'best' model found by the maximum likelihood analysis and a global $\chi^2$ analysis. It seems that the overall agreement is obtained by some 'conspiracy', where the combination of the independent modes yields a reasonable $\chi^2$. This implies a gross disagreement of the most favorable cosmological model with the velocity data, or the need to invoke some non-trivial biasing.

PCA can also play a very important role in designing and planning new surveys. PCA is based on analyzing the data covariance matrix, which expresses the statistical properties of the data rather than its actual numerical values. It follows that it can be applied before a survey is carried out, and can therefore be used to design it. By studying the spectrum and structure of the eigenmodes of a survey of given geometry, depth and expected errors, the constraining power of the survey can be properly evaluated in its planning phase.